Evolutionary Clustering Algorithm with Knowledge-Based Evaluation for Fuzzy Cluster Analysis of Gene Expression Profiles
Abstract
Clustering, which groups thousands of genes by the similarity of their expression levels, has been used to identify unknown functions of genes. Fuzzy clustering, one category of clustering, assigns each sample to multiple groups according to its membership degrees. It is more appropriate than hard clustering for analyzing gene expression profiles, since a single gene may be involved in multiple genetic functions. However, general clustering methods are sensitive to initialization and can be trapped in local optima. To address these problems, we propose an evolutionary fuzzy clustering algorithm with knowledge-based evaluation. It uses a genetic algorithm for clustering and prior knowledge of the data for evaluation. The yeast cell-cycle dataset was used in experiments to demonstrate the usefulness of the proposed method.
1 Evolutionary Fuzzy Clustering with Knowledge-Based Evaluation
General clustering algorithms share common problems: because they work by minimizing an objective function, they are very sensitive to initial values and can be trapped in local optima [1]. Evaluating the clustering results is a further problem. Because gene expression profiles vary with their characteristics and with the environments in which they were collected, it is not appropriate to evaluate them all with the same criteria. We propose an evolutionary fuzzy clustering and knowledge-based evaluation method to solve these problems. A genetic algorithm (GA), an efficient method for solving optimization problems, is applied in the evolutionary fuzzy clustering method. There have been many publications on evolutionary computation for clustering. Maulik and Bandyopadhyay sought to minimize the distances between the data in each cluster and its cluster center [1], and Hall used a GA to minimize the objective function values of the hard and fuzzy c-means algorithms.
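The objective function that such GA-based approaches minimize can be made concrete with the standard fuzzy c-means objective J_m = Σ_i Σ_k u_ik^m ||x_k − v_i||², where the memberships u_ik are derived from candidate centers v_i by the usual FCM membership update. The following is a minimal plain-Python sketch under that assumption, not the authors' or Hall's exact formulation; the function names `memberships` and `fcm_objective` and the fuzzifier default m = 2 are illustrative choices.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(data: Sequence[Vector], centers: Sequence[Vector],
                m: float = 2.0) -> List[List[float]]:
    """Standard FCM membership update; u[i][k] is sample k's degree in cluster i."""
    c, n = len(centers), len(data)
    u = [[0.0] * n for _ in range(c)]
    exponent = 1.0 / (m - 1.0)  # distances are squared, so 2/(m-1) becomes 1/(m-1)
    for k, x in enumerate(data):
        d = [sq_dist(x, v) for v in centers]
        zeros = [i for i, di in enumerate(d) if di == 0.0]
        if zeros:
            # Sample lies exactly on one or more centers: share membership among them.
            for i in zeros:
                u[i][k] = 1.0 / len(zeros)
            continue
        for i in range(c):
            u[i][k] = 1.0 / sum((d[i] / d[j]) ** exponent for j in range(c))
    return u


def fcm_objective(data: Sequence[Vector], centers: Sequence[Vector],
                  m: float = 2.0) -> float:
    """J_m = sum_i sum_k u_ik^m * ||x_k - v_i||^2 (lower is better)."""
    u = memberships(data, centers, m)
    return sum(u[i][k] ** m * sq_dist(x, v)
               for i, v in enumerate(centers)
               for k, x in enumerate(data))


if __name__ == "__main__":
    data = [[0.0], [0.1], [1.0], [1.1]]
    good = [[0.05], [1.05]]  # centers near the two groups
    bad = [[0.5], [0.6]]     # poorly placed centers
    print(fcm_objective(data, good) < fcm_objective(data, bad))  # True
```

A GA could, for example, maximize 1 / (1 + J_m) over a population of candidate center sets; that fitness mapping is likewise an assumption made only for illustration.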
However, these approaches fixed the number of clusters and used the GA only to minimize the objective function. We encode one cluster partition, with a variable number of clusters, as one chromosome, and thus form various cluster partitions.
The proposed method is divided into two parts: an evolutionary clustering part, which searches for the optimal cluster partition using the GA, and a knowledge-based evaluation part, which obtains the optimal α-cuts from several datasets for the Bayesian validation (BV) method. The fuzzy c-means algorithm, known as the most widely used fuzzy clustering method, is used for clustering [2]. For knowledge-based evaluation, we use BV and decision tree (DT) rules to decide the optimal α-cut value. The original BV [3] evaluates cluster partitions with the same α-cut for all datasets, but it cannot evaluate the clustering results correctly, since each dataset has a different distribution and the datasets are extracted from different environments. We therefore obtain an α-cut value for each dataset using the DT rules.
First, N gene expression profiles are clustered using the fuzzy c-means algorithm, and the results are evaluated by BV. Next, the optimal α-cut for each dataset is determined, and these values are used as the labels of the DT training data. The rule-production process then trains the DT and produces rules. As Fig. 1 illustrates, the attributes of the DT training data are computed from the membership matrices, which are the fuzzy clustering results for each dataset. The membership-degree range from 0.0 to 1.0 is divided into 10 sections of width 0.1. For each section, the number of samples falling in it is counted, and the attribute is that count divided by the total number of samples. The resulting attributes are A1 through A10.
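The attribute computation for the DT training data can be sketched as follows. The text does not say exactly which membership values are counted, so this sketch assumes each sample contributes its highest membership degree across clusters; the function name `dt_attributes` and the matrix layout u[i][k] (cluster i, sample k) are hypothetical. In the proposed method, each such vector would be paired with the optimal α-cut found by BV for its dataset to form one DT training example.

```python
from typing import List, Sequence


def dt_attributes(u: Sequence[Sequence[float]], n_sections: int = 10) -> List[float]:
    """Turn a c x n fuzzy membership matrix into the attribute vector A1..A10.

    Assumption: each sample is represented by its highest membership degree.
    """
    n = len(u[0])                       # number of samples (columns)
    counts = [0] * n_sections
    for k in range(n):
        top = max(row[k] for row in u)  # sample k's largest membership degree
        # Section s covers [s/10, (s+1)/10); a degree of exactly 1.0 goes in the last one.
        s = min(int(top * n_sections), n_sections - 1)
        counts[s] += 1
    return [cnt / n for cnt in counts]  # frequency divided by total number of samples


if __name__ == "__main__":
    u = [[0.9, 0.2, 0.55, 1.0],
         [0.1, 0.8, 0.45, 0.0]]
    print(dt_attributes(u))  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.25, 0.0, 0.0, 0.25, 0.5]
```

Because a sample's highest membership is at least 1/c for c clusters, the lowest sections stay empty when c is small; counting every entry of the matrix is another plausible reading of the text.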
Similar resources
Evolutionary Fuzzy Clustering Algorithm with Knowledge-Based Evaluation and Applications for Gene Expression Profiling
In microarray data analysis, clustering is a method that groups thousands of genes by their similarities of expression levels, helping to analyze gene expression profiles. This method has been used for identifying unknown functions of genes. The fuzzy clustering method assigns one sample to multiple groups according to their degrees of membership. This method is more appropriate for analyzing g...
Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis
Recognizing genes with distinctive expression levels can help in prevention, diagnosis and treatment of the diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering the gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...
Evolutionary fuzzy cluster analysis with Bayesian validation of gene expression profiles
Clustering analysis of the gene expression profiles has been used for identifying the functions of unknown genes. Fuzzy clustering method, which is one category of clustering, assigns one sample to multiple clusters as their degrees of membership. It is more appropriate for analyzing gene expression profiles because genes usually belong to multiple functional families. However, general clusteri...
A Multi-Objective Approach to Fuzzy Clustering using ITLBO Algorithm
Data clustering is one of the most important areas of research in data mining and knowledge discovery. Recent research in this area has shown that the best clustering results can be achieved using multi-objective methods. In other words, assuming more than one criterion as objective functions for clustering data can measurably increase the quality of clustering. In this study, a model with two ...
Fuzzy c-means clustering with prior biological knowledge
We propose a novel semi-supervised clustering method called GO Fuzzy c-means, which enables the simultaneous use of biological knowledge and gene expression data in a probabilistic clustering algorithm. Our method is based on the fuzzy c-means clustering algorithm and utilizes the Gene Ontology annotations as prior knowledge to guide the process of grouping functionally related genes. Unlike tr...
Publication date: 2005